Quasi-stationary distributions as centrality measures of reducible graphs
A random walk can be used as a centrality measure of a directed graph. However,
if the graph is reducible, the random walk is absorbed in some subset of
nodes and never visits the rest of the graph. In Google PageRank this
problem was solved by introducing uniform random jumps with some
probability. Up to the present, there is no clear criterion for the choice of this
parameter. We propose a parameter-free centrality measure based
on the notion of a quasi-stationary distribution. Specifically, we suggest four
quasi-stationary based centrality measures, analyze them, and conclude that they
produce approximately the same ranking. The new centrality measures can be
applied in spam detection to detect "link farms" and in image search to find
photo albums.
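A minimal sketch of the underlying idea (not the paper's four specific measures): restrict the transition matrix of a reducible chain to its transient states, giving a substochastic matrix Q; the quasi-stationary distribution is the normalized left Perron eigenvector of Q, and its entries can rank the transient nodes. The 3-node chain below is an invented example.

```python
import numpy as np

# Transient part Q of a reducible chain (rows sum to < 1: the missing
# mass is the probability of being absorbed). Illustrative values only.
Q = np.array([
    [0.0, 0.5, 0.4],
    [0.3, 0.0, 0.6],
    [0.5, 0.4, 0.0],
])

# Left eigenvector of Q = eigenvector of Q.T; take the Perron root
# (largest eigenvalue), whose eigenvector is positive up to sign.
vals, vecs = np.linalg.eig(Q.T)
k = int(np.argmax(vals.real))
v = np.abs(vecs[:, k].real)
qsd = v / v.sum()  # normalize to a probability distribution

print(qsd)  # quasi-stationary mass of each transient node, usable as a ranking
```

The eigenvector is defined only up to scale and sign, hence the `np.abs` and normalization before interpreting the entries as a distribution.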
Tensor approach to mixed high-order moments of absorbing Markov chains
Moments of absorbing Markov chains are considered. First moments and non-mixed second moments are derived in classical textbooks such as "Finite Markov Chains" by J. Kemeny and J. Snell, because these moments can be easily expressed in matrix form. Since the representation of mixed higher-order moments in matrix form is not straightforward, if possible at all, they had not previously been calculated. This paper fills that gap: a tensor approach to the mixed high-order moments is proposed, and compact closed-form expressions for the moments are derived.
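For context, the classical first moments mentioned above come from the fundamental matrix: if Q is the transient part of the transition matrix, then N = (I - Q)^{-1} holds the expected visit counts and N·1 the expected absorption times. A toy two-state example (the values are invented, not from the paper):

```python
import numpy as np

# Transient part Q of a small absorbing chain (illustrative values).
Q = np.array([[0.0, 0.5],
              [0.4, 0.0]])

# Fundamental matrix: N[i, j] = expected number of visits to transient
# state j before absorption, starting from transient state i.
N = np.linalg.inv(np.eye(2) - Q)

# First moments of the absorption time: t[i] = expected steps from i.
t = N @ np.ones(2)  # -> absorption times 1.875 and 1.75
```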
Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation
We study the problem of quick detection of top-k Personalized PageRank lists.
This problem has a number of important applications, such as finding local cuts
in large graphs, estimating similarity distance, and name disambiguation. In
particular, we apply our results to construct efficient algorithms for the
person name disambiguation problem. We argue that two observations are
important when finding top-k Personalized PageRank lists. Firstly, it is
crucial to quickly detect the top-k most important neighbours of a node,
while the exact order within the top-k list, as well as the exact values of
PageRank, are far less crucial. Secondly, a small number of wrong elements in a
top-k list does not really degrade its quality, but tolerating them can lead to
significant computational savings. Based on these two key observations we
propose Monte Carlo methods for fast detection of top-k Personalized PageRank
lists. We provide a performance evaluation of the proposed methods and supply
stopping criteria. We then apply the methods to the person name disambiguation
problem. The developed algorithm for the person name disambiguation problem
achieved second place in the WePS 2010 competition.
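A hedged sketch of the Monte Carlo idea (the paper's actual estimators and stopping criteria are more refined): simulate many random walks from the personalization node, terminating each step with probability 1 - alpha; visit counts approximate Personalized PageRank up to normalization, and the most-visited nodes form a top-k candidate list. The graph, function name, and parameters below are invented for illustration.

```python
import random
from collections import Counter

def mc_top_k(adj, seed, k, alpha=0.85, n_walks=10000, rng=None):
    """Monte Carlo estimate of the top-k Personalized PageRank nodes.

    adj: dict node -> list of out-neighbours.  Each walk starts at `seed`,
    continues with probability alpha, and restarts (ends) otherwise;
    dangling nodes also end the walk.  Visit counts approximate PPR up
    to normalization, which is enough to rank the top-k nodes.
    """
    rng = rng or random.Random(0)
    visits = Counter()
    for _ in range(n_walks):
        node = seed
        while True:
            visits[node] += 1
            if rng.random() > alpha or not adj.get(node):
                break
            node = rng.choice(adj[node])
    return [n for n, _ in visits.most_common(k)]

adj = {0: [1, 2], 1: [2], 2: [0], 3: [0]}
print(mc_top_k(adj, seed=0, k=2))  # seed node 0 ranks first in its own PPR list
```

This mirrors the two observations above: visit counts converge to a stable top-k ranking long before the PPR values themselves are accurate.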
Approaches to PageRank computation based on Monte Carlo methods and Markov chains
Pagerank based clustering of hypertext document collections
Clustering hypertext document collections is an important task in Information Retrieval. Most clustering methods are based on document content and do not take the hypertext links into account. Here we propose a novel PageRank-based clustering (PRC) algorithm which uses the hypertext structure. The PRC algorithm produces graph partitionings with high modularity and coverage. A comparison of the PRC algorithm with two content-based clustering algorithms shows a good match between PRC clustering and content-based clustering.
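The abstract evaluates partitions by modularity; as a minimal illustration (this is the standard Newman modularity, not the PRC algorithm itself), modularity can be computed directly from its definition. The two-triangle graph below is an invented example.

```python
def modularity(adj, communities):
    """Newman modularity of a partition of an undirected graph.

    adj: dict node -> list of neighbours (each edge listed in both directions).
    communities: iterable of node collections covering all nodes.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    q = 0.0
    for comm in communities:
        comm = set(comm)
        # Edges with both endpoints inside the community (counted twice, so halve).
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree = sum(len(adj[u]) for u in comm)  # total degree of the community
        q += internal / m - (degree / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge (2-3).
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))  # natural split scores ~0.357
```

Splitting along the bridge scores well above the trivial one-community partition (modularity 0), which is the sense in which PRC's partitions are reported as "high modularity".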